52 research outputs found

    A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

    We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, $\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 pages

    Mathematics Is Physics

    In this essay, I argue that mathematics is a natural science---just like physics, chemistry, or biology---and that this can explain the alleged "unreasonable" effectiveness of mathematics in the physical sciences. The main challenge for this view is to explain how mathematical theories can become increasingly abstract and develop their own internal structure, whilst still maintaining an appropriate empirical tether that can explain their later use in physics. In order to address this, I offer a theory of mathematical theory-building based on the idea that human knowledge has the structure of a scale-free network and that abstract mathematical theories arise from a repeated process of replacing strong analogies with new hubs in this network. This allows mathematics to be seen as the study of regularities, within regularities, within ..., within regularities of the natural world. Since mathematical theories are derived from the natural world, albeit at a much higher level of abstraction than most other scientific theories, it should come as no surprise that they so often show up in physics. This version of the essay contains an addendum responding to Sylvia Wenmackers' essay and comments that were made on the FQXi website. Comment: 15 pages, LaTeX. Second prize winner in 2015 FQXi Essay Contest (see http://fqxi.org/community/forum/topic/2364)

    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures

    The No-Free-Lunch Theorems of Supervised Learning

    The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must have an inherent inductive bias, that wants justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require for input a model, representing a bias. Generic algorithms themselves, they can be given a model-relative justification.

    The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data

    The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used in a number of applications, including ecophysiology studies, remote sensing studies, and development of ecosystem and Earth system models. FLUXNET2015 includes derived-data products, such as gap-filled time series, ecosystem respiration and photosynthetic uptake estimates, estimation of uncertainties, and metadata about the measurements, presented for the first time in this paper. In addition, 206 of these sites are for the first time distributed under a Creative Commons (CC-BY 4.0) license. This paper details this enhanced dataset and the processing methods, now made available as open-source codes, making the dataset more accessible, transparent, and reproducible. Peer reviewed

    Author Correction: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data